To address the challenge of vehicle re-identification (Re-ID) in complex lighting environments, multi-spectral sources such as visible and infrared information are considered for their excellent complementary advantages. However, multi-spectral vehicle Re-ID suffers from cross-modality discrepancy caused by the heterogeneous properties of different modalities, as well as the large appearance variation of each identity across views. Meanwhile, diverse environmental interference leads to large sample distribution discrepancies within each modality. In this work, we propose a novel cross-directional consistency network to simultaneously overcome the discrepancies from both the modality and sample aspects. In particular, we design a new cross-directional center loss that pulls the modality centers of each identity close together to alleviate the cross-modality discrepancy, and pulls the sample centers of each identity close together to alleviate the sample discrepancy. This strategy can generate discriminative multi-spectral feature representations for vehicle Re-ID. In addition, we design an adaptive layer normalization unit that dynamically adjusts individual feature distributions to handle the distribution discrepancy of intra-modality features for robust learning. To provide a comprehensive evaluation platform, we create a high-quality RGB-NIR-TIR multi-spectral vehicle Re-ID benchmark (MSVR310), which includes 310 different vehicles captured from a broad range of viewpoints, time spans, and environmental complexities. Comprehensive experiments on both the created dataset and public datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.
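The cross-directional center loss is only described at a high level in the abstract. Purely as a minimal sketch of the general idea (pulling per-identity modality centers together and samples toward their identity center), and not the paper's actual formulation, something along the following lines could be written; all function names, variable names, and the uniform weighting of terms are assumptions:

```python
import torch
import torch.nn.functional as F

def cross_directional_center_loss(features, labels, modalities):
    """Illustrative sketch of a center-pulling loss (not the paper's exact formulation).

    features:   (N, D) embedding batch
    labels:     (N,)   vehicle identity per sample
    modalities: (N,)   modality index per sample (e.g. 0=RGB, 1=NIR, 2=TIR)
    """
    loss = features.new_zeros(())
    num_terms = 0
    for identity in labels.unique():
        id_mask = labels == identity
        # Per-modality centers for this identity.
        centers = []
        for m in modalities[id_mask].unique():
            centers.append(features[id_mask & (modalities == m)].mean(dim=0))
        # Pull every pair of modality centers of the same identity together.
        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                loss = loss + F.mse_loss(centers[i], centers[j])
                num_terms += 1
        # Pull each sample of this identity toward the identity's overall center.
        id_center = features[id_mask].mean(dim=0)
        loss = loss + ((features[id_mask] - id_center) ** 2).sum(dim=1).mean()
        num_terms += 1
    return loss / max(num_terms, 1)
```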
As a "new frontier in evolutionary computation research", evolutionary transfer optimization (ETO) overcomes the traditional paradigm of zero reuse of related experience and knowledge from problems already solved in evolutionary computation. In scheduling applications via ETO, a very attractive and highly competitive "meeting" framework can be formed for intelligent scheduling and green scheduling, especially in view of China's pledge of carbon neutrality. To the best of our knowledge, the scheduling work presented here is the first on a class of ETO frameworks in which multi-objective optimization problems "meet" single-objective optimization problems in the discrete case (rather than multi-task optimization). More specifically, key knowledge conveyed for industrial applications, such as positional building blocks learned with genetic algorithms for the permutation flow shop scheduling problem (PFSP), can be exploited through a new core transfer mechanism and learning technique. Extensive studies on well-studied benchmarks validate the firm effectiveness and great universality of the proposed ETO-PFSP framework. Our investigation (1) enriches the family of ETO frameworks, (2) contributes to the classical and fundamental theory of the building-block basis of genetic algorithms and memetic algorithms, and (3) moves toward a paradigm shift of evolutionary scheduling via learning, namely "knowledge and building-block based scheduling" (KAB2S) for "industrial intelligence" in China.
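The abstract names positional building blocks as the transferred knowledge but gives no algorithmic detail. Purely as an illustration of one way such transfer could work (the sampling scheme and all names here are assumptions, not the paper's mechanism), one might bias a target-task GA's initial population with a position-frequency model learned from elite source-task permutations:

```python
import random
import numpy as np

def positional_block_model(elite_permutations, n_jobs):
    """Frequency of job j appearing at position p among elite source solutions."""
    freq = np.zeros((n_jobs, n_jobs))
    for perm in elite_permutations:
        for pos, job in enumerate(perm):
            freq[pos, job] += 1
    return freq / max(len(elite_permutations), 1)

def seed_target_population(freq, pop_size):
    """Sample initial target-task permutations biased by the transferred model."""
    n_jobs = freq.shape[0]
    population = []
    for _ in range(pop_size):
        remaining = list(range(n_jobs))
        perm = []
        for pos in range(n_jobs):
            weights = [freq[pos, j] + 1e-6 for j in remaining]
            job = random.choices(remaining, weights=weights, k=1)[0]
            remaining.remove(job)
            perm.append(job)
        population.append(perm)
    return population
```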
Evolutionary transfer optimization (ETO) is a "new frontier in evolutionary computation research" that avoids the zero reuse of experience and knowledge from problems already solved in traditional evolutionary computation. In scheduling applications via ETO, a competitive "meeting" framework between them could be constituted for intelligent scheduling and green scheduling, especially against the background of carbon neutrality in China. To the best of our knowledge, the study here is the first work of ETO for complex optimization in which multi-objective problems "meet" combinatorial cases (rather than multi-task optimization). More specifically, key knowledge, such as that for the permutation flow shop scheduling problem (PFSP), can be learned and transferred. Empirical studies on well-studied benchmarks demonstrate the relatively firm effectiveness and great potential of our proposed ETO-PFSP framework.
Monocular 3D object detection aims to localize 3D bounding boxes in a single input 2D image. It is a highly challenging problem and remains open, especially when no extra information (e.g., depth, LiDAR, and/or multiple frames) can be leveraged in training and/or inference. This paper proposes a simple yet effective formulation for monocular 3D object detection without exploiting any extra information. It presents the MonoCon method, which learns auxiliary monocular contexts in training to help monocular 3D object detection. The key idea is that, with the annotated 3D bounding boxes of objects in an image, there is a rich set of well-posed projected 2D supervision signals available in training, such as the projected corner keypoints and their associated offset vectors with respect to the center of the 2D bounding box, which should be exploited as auxiliary tasks in training. The proposed MonoCon is motivated at a high level by the Cramer-Wold theorem in measure theory. In implementation, it utilizes a very simple end-to-end design to justify the effectiveness of learning auxiliary monocular contexts, consisting of three components: a deep neural network (DNN) based feature backbone, a number of regression head branches for learning the essential parameters used in 3D bounding box prediction, and a number of regression head branches for learning auxiliary contexts. After training, the auxiliary context regression branches are discarded for better inference efficiency. In experiments, the proposed MonoCon is tested on the KITTI benchmark (car, pedestrian and cyclist). It outperforms all prior art in the leaderboard on the car category and obtains comparable performance on pedestrian and cyclist in terms of accuracy. Thanks to the simple design, the proposed MonoCon method obtains the fastest inference speed of 38.7 fps in the comparison.
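To make the three-component design concrete, a schematic PyTorch-style sketch is given below; the head names, output channel counts, and feature width are assumptions for illustration, not the paper's implementation. The only structural point it demonstrates is that the auxiliary context branches are evaluated in training only and dropped at inference:

```python
import torch.nn as nn

class MonoConLikeDetector(nn.Module):
    """Schematic layout only; head set, channel sizes and names are assumptions."""

    def __init__(self, backbone, feat_channels=64):
        super().__init__()
        self.backbone = backbone  # DNN feature backbone
        # Essential heads: parameters actually used to assemble the 3D box.
        self.essential_heads = nn.ModuleDict({
            "center_heatmap": nn.Conv2d(feat_channels, 3, 1),
            "depth":          nn.Conv2d(feat_channels, 1, 1),
            "dimensions":     nn.Conv2d(feat_channels, 3, 1),
            "orientation":    nn.Conv2d(feat_channels, 8, 1),
        })
        # Auxiliary context heads: projected keypoints, offsets, etc. (training only).
        self.auxiliary_heads = nn.ModuleDict({
            "keypoint_heatmap": nn.Conv2d(feat_channels, 9, 1),
            "keypoint_offset":  nn.Conv2d(feat_channels, 18, 1),
        })

    def forward(self, images):
        feats = self.backbone(images)
        outputs = {k: head(feats) for k, head in self.essential_heads.items()}
        if self.training:  # auxiliary branches are discarded at inference time
            outputs.update({k: head(feats) for k, head in self.auxiliary_heads.items()})
        return outputs
```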
Sparse tensor best rank-1 approximation (BR1Approx), which is a sparsity generalization of the dense tensor BR1Approx and a higher-order extension of the sparse matrix BR1Approx, is one of the most important problems in sparse tensor decomposition and related problems arising from statistics and machine learning. By exploiting the multilinearity as well as the sparsity structure of the problem, four approximation algorithms are proposed, which are easy to implement, of low computational complexity, and can serve as initialization procedures for iterative algorithms. In addition, theoretically guaranteed worst-case approximation lower bounds are proved for all the algorithms. We provide numerical experiments on synthetic and real data to illustrate the effectiveness of the proposed algorithms.
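The abstract does not state the optimization problem explicitly. Assuming the standard formulation of best rank-1 approximation, the problem can be written as the following multilinear maximization, with cardinality constraints added in the sparse variant (the sparsity levels $r_i$ are illustrative symbols):

```latex
% Dense BR1Approx:  min_{\lambda,\ \|x_i\|_2=1} \| \mathcal{A} - \lambda\, x_1 \circ \cdots \circ x_d \|_F
% is equivalent to the multilinear maximization below; the sparse variant adds \|x_i\|_0 \le r_i.
\max_{x_1,\dots,x_d} \ \langle \mathcal{A},\ x_1 \circ \cdots \circ x_d \rangle
\quad \text{s.t.} \quad \|x_i\|_2 = 1,\ \ \|x_i\|_0 \le r_i,\ \ i = 1,\dots,d .
```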
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
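A minimal sketch of the described objective, assuming one straightforward reading of it (positive pairs are the same high-confidence node across the two views; negatives are the other clusters' centers; all names and the averaging scheme are assumptions rather than CCGC's exact loss):

```python
import torch
import torch.nn.functional as F

def cluster_guided_contrastive_loss(z1, z2, cluster_ids, centers2):
    """Illustrative sketch, not the exact CCGC objective.

    z1, z2:      (N, D) embeddings of the same high-confidence nodes in two views
    cluster_ids: (N,)   high-confidence cluster assignment of each node
    centers2:    (K, D) cluster centers computed in the second view
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    centers2 = F.normalize(centers2, dim=1)

    # Positive pairs: the same node (same high-confidence cluster) across views.
    positive = (z1 * z2).sum(dim=1)                    # cross-view cosine similarity

    # Negative pairs: each node vs. the centers of the *other* clusters.
    sim_to_centers = z1 @ centers2.t()                 # (N, K)
    own = F.one_hot(cluster_ids, centers2.size(0)).bool()
    negative = sim_to_centers.masked_fill(own, 0.0).sum(dim=1) / (centers2.size(0) - 1)

    # Maximize positive similarity, minimize negative similarity.
    return (negative - positive).mean()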
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that the rendered pixels at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which makes the supersampling an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features to significantly improve their temporal stability. Then a reconstruction network based on a multi-scale U-Net with skip connections is adopted for reconstruction and generation of the desired high-resolution image. Experimental results and comparisons have shown that our proposed method can generate higher quality results of supersampling, without increasing the total number of ray-tracing samples, over current state-of-the-art methods.
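The temporal accumulation step is described only as "computing the correlation between current and previous features". A minimal sketch of one common realization of that idea, assuming motion-vector warping and cosine-similarity gating (not the paper's actual network; all names and the gating rule are assumptions):

```python
import torch
import torch.nn.functional as F

def temporal_accumulation(curr_feat, prev_feat, motion_vectors):
    """Illustrative sketch of correlation-weighted temporal accumulation.
    curr_feat, prev_feat: (B, C, H, W); motion_vectors: (B, 2, H, W) offsets in [-1, 1]."""
    B, C, H, W = curr_feat.shape
    # Build a sampling grid that follows the motion vectors to warp the previous frame.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=curr_feat.device),
        torch.linspace(-1, 1, W, device=curr_feat.device),
        indexing="ij",
    )
    base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
    grid = base_grid + motion_vectors.permute(0, 2, 3, 1)
    warped_prev = F.grid_sample(prev_feat, grid, align_corners=True)

    # Per-pixel correlation between current and warped previous features
    # gates how much history is accumulated.
    corr = F.cosine_similarity(curr_feat, warped_prev, dim=1, eps=1e-6)  # (B, H, W)
    alpha = corr.clamp(min=0.0).unsqueeze(1)                             # (B, 1, H, W)
    return alpha * warped_prev + (1.0 - alpha) * curr_feat
```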
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such a mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
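The abstract does not say how the soft boundary labels are formed. One minimal, commonly used form, given here only as an assumed illustration of the idea (Gaussian smoothing around the annotated start/end frames, so frames adjacent to a lost boundary are not treated as hard negatives):

```python
import torch

def soft_boundary_labels(num_frames, start_idx, end_idx, sigma=1.0):
    """Illustrative sketch: Gaussian-smoothed soft labels around the annotated
    start/end frames (an assumed form, not necessarily SSRN's)."""
    positions = torch.arange(num_frames, dtype=torch.float32)
    start_soft = torch.exp(-(positions - start_idx) ** 2 / (2 * sigma ** 2))
    end_soft = torch.exp(-(positions - end_idx) ** 2 / (2 * sigma ** 2))
    return start_soft / start_soft.sum(), end_soft / end_soft.sum()
```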
Representing and synthesizing novel views in real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or utilizing temporal information between several adjacent frames without considering the underlying background distribution in the entire scene or the transmittance over the ray dimension, limiting their performance on static and occlusion areas. Our approach $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, which is called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively. Each ray sample is given an additional occlusion weight to indicate the transmittance lying in the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
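The abstract states that each ray sample receives an occlusion weight indicating the transmittance split between the static and dynamic components, without giving the rendering equation. One plausible blended volume-rendering form, written here purely as an assumed illustration (superscripts s and d denote the static-background and dynamic branches; $w_i$ is the per-sample occlusion weight), is:

```latex
% Assumed blending form for illustration, not taken from the paper:
\hat{C}(\mathbf{r}) \;=\; \sum_{i} T_i \Big( w_i\,\alpha_i^{s}\,\mathbf{c}_i^{s}
  \;+\; (1 - w_i)\,\alpha_i^{d}\,\mathbf{c}_i^{d} \Big),
\qquad
T_i \;=\; \prod_{j<i} \big(1 - \alpha_j\big),
\quad
\alpha_j \;=\; w_j\,\alpha_j^{s} + (1 - w_j)\,\alpha_j^{d}.
```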
Deploying reliable deep learning techniques in interdisciplinary applications requires learned models to output accurate and (even more importantly) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under an implicit assumption that faithful explanations come from accurate predictions/classifications. We have an opposite claim that explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in those neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network dubbed NeuroExplainer, with applications to uncover altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, our NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and respective discriminative representations to accurately recognize preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) in network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer leads to quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.
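The abstract names sparsity as one of the implicitly maximized explainability metrics but gives no formula. As an assumed illustration only (not the paper's regularizer), a sparsity-encouraging penalty on normalized vertex-wise attention maps could be an entropy term:

```python
import torch

def attention_sparsity_penalty(attention, eps=1e-8):
    """Illustrative sketch of a sparsity regularizer on attention maps (assumed form).
    attention: (B, V) non-negative attention over V cortical vertices, rows sum to 1."""
    entropy = -(attention * (attention + eps).log()).sum(dim=1)
    return entropy.mean()  # low entropy == concentrated (sparse) attention
```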